Add support for automatic naming of random variables #6364

zaxtax · 2022-12-03T05:22:36Z

This PR introduces a way to use distributions without needing to provide a name.

This means:

import pymc as pm
with pm.Model():
    x = pm.Normal(0., 1.)

is equivalent to

import pymc as pm
with pm.Model():
    x = pm.Normal("x", 0., 1.)

I use some code originally written by @Tobias-Kohn for pyprob. The implementation is done by inspecting stack frames to figure out what the assignment will be. It picks earliest point an assignment can take place.

This PR is intended to be fully backwards-compatible.

Checklist

Explain important implementation details 👆
Make sure that the pre-commit linting/style checks pass.
Are the changes covered by tests and docstrings?
Fill out the short summary sections 👇

codecov · 2022-12-03T05:33:18Z

Codecov Report

Merging #6364 (97b1f58) into main (5688555) will decrease coverage by 0.02%.
Report is 434 commits behind head on main.
The diff coverage is 84.31%.

❗ Current head 97b1f58 differs from pull request most recent head e835fa0. Consider uploading reports for the commit e835fa0 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6364      +/-   ##
==========================================
- Coverage   94.72%   94.71%   -0.02%     
==========================================
  Files         132      132              
  Lines       26740    26789      +49     
==========================================
+ Hits        25330    25372      +42     
- Misses       1410     1417       +7

Files	Coverage Δ
pymc/tests/distributions/test_distribution.py	`98.31% <100.00%> (+0.09%)`	⬆️
pymc/distributions/distribution.py	`93.18% <78.94%> (-2.00%)`	⬇️

ricardoV94

Pretty wild 😜 This will require group approval for sure but could make it to PyMC5 if it goes through.

Does it handle assignment in list comprehension?

ricardoV94 · 2022-12-03T05:32:48Z

pymc/distributions/distribution.py

+            value = frame.f_locals[var_name]
+        else:
+            value = None
+        if type(value) is int:


Is this for container[i] = pm.Normal(0, 1)?

Missing a test if that's the case.

I haven't tested assignment in a list comprehension. Can you include a short example?

Lots of the complexity in _extract_target_of_assignment is for supporting a wider variety of things you can assign to like container[i].

Here I was asking what this integer case was for.

For list comprehension I was thinking normals = [pm.Normal() for i in range(5)]

Not saying we need to support it, just asking if it would be possible.

The integer case is for assigning to tuples I believe. The list comprehension case does not work, but we could probably modify _extract_target_of_assignment to detect that situation.

I think we should hold off on supporting it. Generating unique names for comprehensions that are predictable to the user looks tricky to me.

The integer case is for assigning to tuples I believe.

Can you add a test?

pymc/distributions/distribution.py

pymc/tests/distributions/test_distribution.py

ricardoV94 · 2022-12-03T07:15:36Z

@pymc-devs/dev-core Weight in :)

twiecki · 2022-12-03T09:34:40Z

Amazing!

michaelosthege · 2022-12-03T10:28:45Z

I remember that I didn't like this pattern before, because it messes up the typing of the arguments =/

I know that the typing of distribution arguments isn't great in the first place, but we're also talking about refactoring all these to functions.
For functions we could have clean signatures and that would be a great improvement of UX too.

How would the signature of pm.Normal look like if it was a function?

Can this be done with overloads?

zaxtax · 2022-12-03T14:16:31Z

I remember that I didn't like this pattern before, because it messes up the typing of the arguments =/

I know that the typing of distribution arguments isn't great in the first place, but we're also talking about refactoring all these to functions. For functions we could have clean signatures and that would be a great improvement of UX too.

How would the signature of pm.Normal look like if it was a function?

Can this be done with overloads?

As you mentioned this doesn't change the typing too much as is. If we were to change the distributions to functions, I think this could be handled with an overload. One with a explicit name argument, and one without.

Most of the hackiness of this comes from trying to maintain full backwards-compatibility. If we were ok with an API break, this just gets typed with name becoming an optional argument.

michaelosthege · 2022-12-03T14:32:13Z

pymc/distributions/distribution.py

+            name = args[0]
+            args = args[1:]
+        else:
+            name = _extract_target_of_assignment(2)


For situations where automated name determination fails, this part should raise an informative error.
For example, one might simply forget to pass the name:

with pmodel: pm.Normal("Y", pm.Uniform(), observed=2)

There are exceptions a few lines downstream, but here we should do a try/except

or even better: make _extract_target_of_assignment raise the informative error and unit test that directly.

Good idea. What do you mean by an informative error message? I'm not sure what more can I say beyond that I failed to infer the name, and in the _extract_target_of_assignment that no assignment instruction was found at that frame depth.

Maybe something about "inferring the name from the outer code"

if you have a handle on the line of code print that

definitely instructions what do do about it: please pass a name as the first argument

Just pushed. Does that look ok?

the message looks good, but I'd recommend to raise it in the extraction function instead of returning None. It gives a cleaner signature (str or raise) and separation of responsibility..

I think the error message changes. Are we ok with an Assignment not found error that is raised, caught, and re-raised as a Name needs to be provided error message?

fonnesbeck · 2022-12-03T15:53:46Z

Very cool. We have been wanting to infer names since the days of PyMC2.

michaelosthege

@zaxtax something like this

or maybe keep the try except outside, but raise instead of returning none

michaelosthege · 2022-12-03T18:34:41Z

pymc/distributions/distribution.py

+        if "name" in kwargs:
+            name = kwargs.pop("name")
+        elif len(args) > 0 and isinstance(args[0], string_types):
+            name = args[0]
+            args = args[1:]
+        else:
+            name = _extract_target_of_assignment(2)
+            if name is None:
+                raise TypeError(
+                    "Name could not be inferred for variable from surrounding "
+                    "context. Pass a name explicitly as the first argument to "
+                    "the Distribution."
+                )
+
+        if not isinstance(name, string_types):
+            raise TypeError(f"Name needs to be a string but got: {name}")
+


Suggested change

if "name" in kwargs:

name = kwargs.pop("name")

elif len(args) > 0 and isinstance(args[0], string_types):

name = args[0]

args = args[1:]

else:

name = _extract_target_of_assignment(2)

if name is None:

raise TypeError(

"Name could not be inferred for variable from surrounding "

"context. Pass a name explicitly as the first argument to "

"the Distribution."

)

if not isinstance(name, string_types):

raise TypeError(f"Name needs to be a string but got: {name}")

name = kwargs.pop("name", None)

if name is None:

if args and isinstance(args[0], string_types):

# Name was provided as the first argument

name = args[0]

args = args[1:]

else:

# Try to infer name from the outer context.

# This may raise an error, but then there's nothing else we can do.

name = _extract_target_of_assignment(2)

if not isinstance(name, string_types):

raise TypeError(f"Name needs to be a string but got: {name}")

michaelosthege · 2022-12-03T18:39:10Z

pymc/distributions/distribution.py

+# Helper function from pyprob
+def _extract_target_of_assignment(depth):
+    frame = sys._getframe(depth)
+    code = frame.f_code
+    next_instruction = code.co_code[frame.f_lasti + 2]
+    instruction_arg = code.co_code[frame.f_lasti + 3]
+    instruction_name = opcode.opname[next_instruction]
+    if instruction_name == "STORE_FAST":
+        return code.co_varnames[instruction_arg]
+    elif instruction_name in ["STORE_NAME", "STORE_GLOBAL"]:
+        return code.co_names[instruction_arg]
+    elif (
+        instruction_name in ["LOAD_FAST", "LOAD_NAME", "LOAD_GLOBAL"]
+        and opcode.opname[code.co_code[frame.f_lasti + 4]] in ["LOAD_CONST", "LOAD_FAST"]
+        and opcode.opname[code.co_code[frame.f_lasti + 6]] == "STORE_SUBSCR"
+    ):
+        if instruction_name == "LOAD_FAST":
+            base_name = code.co_varnames[instruction_arg]
+        else:
+            base_name = code.co_names[instruction_arg]
+
+        second_instruction = opcode.opname[code.co_code[frame.f_lasti + 4]]
+        second_arg = code.co_code[frame.f_lasti + 5]
+        if second_instruction == "LOAD_CONST":
+            value = code.co_consts[second_arg]
+        elif second_instruction == "LOAD_FAST":
+            var_name = code.co_varnames[second_arg]
+            value = frame.f_locals[var_name]
+        else:
+            value = None
+        if value is not None:
+            index_name = repr(value)
+            return base_name + "[" + index_name + "]"
+        else:
+            return None
+    else:
+        return None


Suggested change

# Helper function from pyprob

def _extract_target_of_assignment(depth):

frame = sys._getframe(depth)

code = frame.f_code

next_instruction = code.co_code[frame.f_lasti + 2]

instruction_arg = code.co_code[frame.f_lasti + 3]

instruction_name = opcode.opname[next_instruction]

if instruction_name == "STORE_FAST":

return code.co_varnames[instruction_arg]

elif instruction_name in ["STORE_NAME", "STORE_GLOBAL"]:

return code.co_names[instruction_arg]

elif (

instruction_name in ["LOAD_FAST", "LOAD_NAME", "LOAD_GLOBAL"]

and opcode.opname[code.co_code[frame.f_lasti + 4]] in ["LOAD_CONST", "LOAD_FAST"]

and opcode.opname[code.co_code[frame.f_lasti + 6]] == "STORE_SUBSCR"

):

if instruction_name == "LOAD_FAST":

base_name = code.co_varnames[instruction_arg]

else:

base_name = code.co_names[instruction_arg]

second_instruction = opcode.opname[code.co_code[frame.f_lasti + 4]]

second_arg = code.co_code[frame.f_lasti + 5]

if second_instruction == "LOAD_CONST":

value = code.co_consts[second_arg]

elif second_instruction == "LOAD_FAST":

var_name = code.co_varnames[second_arg]

value = frame.f_locals[var_name]

else:

value = None

if value is not None:

index_name = repr(value)

return base_name + "[" + index_name + "]"

else:

return None

else:

return None

def _extract_target_of_assignment(depth) -> str:

"""Helper function to infer RV names from outer code.

Adapted from pyprob.

"""

try:

frame = sys._getframe(depth)

code = frame.f_code

next_instruction = code.co_code[frame.f_lasti + 2]

instruction_arg = code.co_code[frame.f_lasti + 3]

instruction_name = opcode.opname[next_instruction]

if instruction_name == "STORE_FAST":

return code.co_varnames[instruction_arg]

elif instruction_name in ["STORE_NAME", "STORE_GLOBAL"]:

return code.co_names[instruction_arg]

elif (

instruction_name in ["LOAD_FAST", "LOAD_NAME", "LOAD_GLOBAL"]

and opcode.opname[code.co_code[frame.f_lasti + 4]] in ["LOAD_CONST", "LOAD_FAST"]

and opcode.opname[code.co_code[frame.f_lasti + 6]] == "STORE_SUBSCR"

):

if instruction_name == "LOAD_FAST":

base_name = code.co_varnames[instruction_arg]

else:

base_name = code.co_names[instruction_arg]

second_instruction = opcode.opname[code.co_code[frame.f_lasti + 4]]

second_arg = code.co_code[frame.f_lasti + 5]

if second_instruction == "LOAD_CONST":

value = code.co_consts[second_arg]

elif second_instruction == "LOAD_FAST":

var_name = code.co_varnames[second_arg]

value = frame.f_locals[var_name]

else:

value = None

if value is not None:

index_name = repr(value)

return base_name + "[" + index_name + "]"

else:

raise Exception()

else:

raise Exception()

except Exception as ex:

raise TypeError(

"Name could not be inferred for variable from surrounding "

"context. Pass a name explicitly as the first argument to "

"the Distribution."

) from ex

Thanks that clarifies things!

aseyboldt · 2022-12-04T01:42:06Z

This is a fun and interesting idea, and I really enjoy the inventiveness of the implementation!

However, I'm not sure I really would want this to be in pymc. The main assumption of this is that people want to use the same name for a variable binding as for the (global) name of the variable in the trace. I just went through a couple of my models, and at least for me that just isn't the case. Less than half of the variables had the same name as the binding, especially for larger models, where variable names tend to get longer because they have to stay globally unique, and part of the model creation is in separate functions with local namespaces.
I had a quick look at models on discourse, and there I think the percentage looks like more than half, but still very far from 100%. There are lots of models that use different names for the local bindings, and I think often for good reason.

Also from a more abstract perspective this seems a bit unintuitive: the global name of a variable and the name of the variable binding are quite different things, and if we did this I think pymc would be the only library I know that uses binding names internally. It also goes very much against general rules in python, where the code where a function is called doesn't matter inside the call. It also adds a lot of strangeness in the argument order of the RV constructors, where we need quite a bit of logic to figure out if there is a name or not.

This seems to me to trade the small inconvenience of repeating a name from time to time for unique strangeness that goes against how python usually works at its core. That doesn't sound like a good tradeoff to me...

If we want to improve variable naming, I think it would be more interesting to think about namespaces. I've been waiting for xarray to eventually support nesting, if that were to happen I think that would help greatly

ricardoV94 · 2022-12-04T05:13:00Z

How does this fare with Distribution constructor helpers such as ZeroInflatedPoisson, OrderedLogistic, LKJCholeskyCov, which are not distributions themselves?

Users should expect the same behavior but I don't want to make these cumbersome to write for developers.

ricardoV94 · 2022-12-04T05:16:26Z

Less than half of the variables had the same name as the binding, especially for larger models, where variable names tend to get longer because they have to stay globally unique, and part of the model creation is in separate functions with local namespaces.

This won't prevent users from using names, so I don't see how this is an argument against. Of course users didn't bother to give unique assignment names in the past, it didn't matter. Obviously that changes if you are using assignment to provide the name. If you repeat it you'll get the usual repeated name error from the Model.

What do you mean with local namespaces?

aseyboldt · 2022-12-04T05:40:13Z

This won't prevent users from using names, so I don't see how this is an argument against. Of course users didn't bother to give unique assignment names in the past, it didn't matter. Obviously that changes if you are using assignment to provide the name. If you repeat it you'll get the usual repeated name error from the Model.

True, it still is possible to specify the name. But this seems like a pretty big change to me, after all with this we are effectively changing how a call in python works. This might be a good idea if that is almost always what we want, but if it isn't, do we really want to make such a deep and unexpected change? Just imaging the confusion of someone familiar with python but not pymc who is reading a model. Somehow there are names in the trace, but those names were never actually passed into any function anywhere! Or someone might rename a local variable (which never changes behavior in any python code I've seen so far), and then some distant code that analyses the trace breaks.

What do you mean with local namespaces?

I mean things like ``` def make_spacial_compontent(): mu = pm.Normal("spacial_mu") sigma = pm.HalfNormal("spacial_sigma") ... ``` So the global variable names are more complicated, but because we only ever see part of that in the function we use other variable names there.

ricardoV94 · 2022-12-04T05:45:07Z

What do you mean with local namespaces?

I mean things like
def make_spacial_compontent():
 mu = pm.Normal("spacial_mu") 
 sigma = pm.HalfNormal("spacial_sigma") 
 ... 
So the global variable names are more complicated, but because we only ever see part of that in the function we use other variable names there.

If you pass a name to the model you get it as the prefix of every variable, so you could use a nested model for that already with this PR

Edit: Still I don't see the issue as long as you can pass names explicitly.

twiecki · 2022-12-04T12:21:21Z

@aseyboldt I see your points, but I don't think you are the target audience for this feature. For me, this is really focused at beginners, and to them it's often not clear why they have to repeat things. Further optimizing the simplicity and readability of the HelloWorld model is pretty important I think. Intermediate and advanced users will probably keep using names.

ricardoV94

How does this fare with Distribution constructor helpers such as ZeroInflatedPoisson, OrderedLogistic, LKJCholeskyCov, which are not distributions themselves?

Users should expect the same behavior but I don't want to make these cumbersome to write for developers.

As expected... these type of classes would have to be tweaked in this PR.

import pymc as pm
with pm.Model() as m:
    x = pm.ZeroInflatedPoisson(psi=0.5, mu=9)

# TypeError: ZeroInflatedPoisson.__new__() missing 1 required positional argument: 'name'

zaxtax · 2022-12-04T16:28:16Z

This is a fun and interesting idea, and I really enjoy the inventiveness of the implementation!

However, I'm not sure I really would want this to be in pymc. The main assumption of this is that people want to use the same name for a variable binding as for the (global) name of the variable in the trace. I just went through a couple of my models, and at least for me that just isn't the case. Less than half of the variables had the same name as the binding, especially for larger models, where variable names tend to get longer because they have to stay globally unique, and part of the model creation is in separate functions with local namespaces. I had a quick look at models on discourse, and there I think the percentage looks like more than half, but still very far from 100%. There are lots of models that use different names for the local bindings, and I think often for good reason.

Also from a more abstract perspective this seems a bit unintuitive: the global name of a variable and the name of the variable binding are quite different things, and if we did this I think pymc would be the only library I know that uses binding names internally. It also goes very much against general rules in python, where the code where a function is called doesn't matter inside the call. It also adds a lot of strangeness in the argument order of the RV constructors, where we need quite a bit of logic to figure out if there is a name or not.

This seems to me to trade the small inconvenience of repeating a name from time to time for unique strangeness that goes against how python usually works at its core. That doesn't sound like a good tradeoff to me...

If we want to improve variable naming, I think it would be more interesting to think about namespaces. I've been waiting for xarray to eventually support nesting, if that were to happen I think that would help greatly

Thanks @aseyboldt for your thoughtful feedback. I think these are valid points and I'm not even sure beginners mind having to label their random variables.

As I mentioned to Michael, much of the hackiness of this originates from trying to maintain full backwards-compatibility. Right now we have lots of user code that treats the name as the first argument. The metaprogramming bits though while very odd are no more odd than what happens with Model where we enter a context and now magically an object is modified behind the scenes. A small method that uses the surrounding scope seems like a more isolated form of magic.

There are also larger design questions that maybe could get us to the same destination but using a more elegant route. I'm not sure we need names for the random variables. They seem to me to be mostly needed for visualisation and diagnostic plots.

ricardoV94 · 2022-12-04T16:37:12Z

I'm not sure we need names for the random variables.

We need names to link variables to sampling results. Otherwise you don't know what is what in the results of prior predictive and sample. Posterior predictive, likewise, relies on names to know what's provided and what needs to be resampled.

Storing and loading results with unnamed TensorVariables as the keys would be a mess.

zaxtax · 2022-12-04T17:04:02Z

Storing and loading results with unnamed TensorVariables as the keys would be a mess.

I think there is a small mess that comes from needing to hold onto the TensorVariables to use as keys. The big mess comes when you need to give them names to be plotted.

This comes up in other PPLs like BeanMachine where random variables are decorated functions. The name became the function's name concatenated with its arguments. Which leads to plots that usually don't look the cleanest.

drbenvincent · 2022-12-04T18:04:39Z

Will this work for unicode variable names? People have a habit of using unicode mu, sigma, beta, etc. If yes, it's worth adding some tests for that.

PS. I think this is great. Think it will allow for much easier on-ramping of beginners.

zaxtax · 2022-12-04T18:20:01Z

Will this work for unicode variable names? People have a habit of using unicode mu, sigma, beta, etc. If yes, it's worth adding some tests for that.

PS. I think this is great. Think it will allow for much easier on-ramping of beginners.

Seems to work with unicode characters

twiecki · 2022-12-05T09:38:38Z

As Python has full unicode support I would expect it to just work, I don't think we need a test for that.

ccaprani · 2022-12-05T09:54:45Z

pymc/distributions/distribution.py

@@ -164,6 +165,45 @@ def fn(*args, **kwargs):
    return fn


+# Helper function from pyprob


URL to original commit on pyprob: pyprob/pyprob@6aac747

fonnesbeck · 2022-12-05T18:31:54Z

Manual naming would have to be retained in order to support things like re-usable model components. For example,

    def component_submodel(name, ...):

        # Aging curve
        c_age = pm.HalfNormal(f"c_age_{name}", 10)
        offset_age = pm.HalfNormal(f"offset_age_{name}", 1)

        ...

where the model instantiates several component_submodels that each have their own instances of all of the random variables created inside the function. There does not appear to be an automatic way of naming these.

aseyboldt · 2022-12-05T20:02:44Z

Well, it seems like I'm the only one to who this looks like too much magic, if that's the case then I guess it's fine. :-)
I'll just argue a little bit more, to at least make sure we don't miss anything...

I guess the reason why this feels like too much magic to me is that this API is not really part of the python language: If you only use API that is not CPython specific, you can not (as far as I know) implement this API at all. In order to implement it, we have to use internal data of CPython that isn't really meant to be used when programming in the language, and that other Python implementations (or future versions of CPython) might not even have. To quote from the CPython docs about some of those functions:

https://docs.python.org/3/library/sys.html#sys._getframe

CPython implementation detail: This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python.

https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy (for the frame attributes)

Internal types A few types used internally by the interpreter are exposed to the user. Their definitions may change with future versions of the interpreter, but they are mentioned here for completeness.

https://docs.python.org/3/library/dis.html?highlight=dis#module-dis

CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases.

At least bytecode also has changed quite a bit over the years as far as I am aware.

This is in contrast to something like the model context, that's just a glorified global variable, and only uses quite simple features of the python language.

I think it is fair to say that an API that can not be implemented in the language is a whole different level of magic. :-)

Given that I don't think it is actually that helpful or at least important, (and @twiecki I think breaking rules of the language is actually worse for beginners than for people who have worked with that for a long time), it just doesn't feel right to me.
Answering the question "Why do I have to reapeat the name" seems quite easy to me: "You are not really repeating anything, the string argument is the name in the trace, the name of the assignment is the local binding of that variable. Those two things don't really have anything to do with each other, only that maybe in some cases you might want to choose the same name.")

This also raises some compatibility questions that I think we really should investigate at least a bit before we commit to anything like this: It is quite possible that this can't actually be implemented at all in other python implementations, including future CPython versions. I had a quick look and maybe pypy would work, but I'm not sure. We might also want to look at pyston, because I think there is some talk of using similar jit ideas in future CPython versions. I don't think it is particularly likely, but it really would be unfortunate if we go for an API like this, and a few CPython versions later this suddenly can't be done anymore, or it can only be done in a different subset of cases.

twiecki · 2022-12-05T20:19:19Z

Well, damn, you raise some good points @aseyboldt, need to let that sink in a bit more. One nit: "I think it is fair to say that an API that can not be implemented in the language is a whole different level of magic. :-)" I wouldn't say it's the API per se, because it's still just a function call, but rather the code implementing the API that's not Python.

michaelosthege · 2022-12-05T22:14:18Z

Adrian isn't the only one with concerns.

If we do this, would we update the documentation to use this feature? All of it? Where would we draw the line?

It's clearly violating the "explicit is better than implicit" rule, and because all of it is happening behind the scenes, it may create more confusion than benefits. After all, the usual API of passing names may be verbose, but it's definitely easy to understand. In particular for examples that opt for informative RV names and short variable names (s = pm.Normal("slope")).

In the end I think its about trading off slope in the very early vs. medium stage learning curve. (Seemingly easier Quickstart examples at the cost of confusion about moderately complex models.)

twiecki · 2022-12-05T22:25:06Z

It's also true that after the first slightly negative reaction, people get over it pretty quickly.

…

On Mon, Dec 5, 2022, 23:14 Michael Osthege ***@***.***> wrote: Adrian isn't the only one with concerns. *If* we do this, would we update the documentation to use this feature? All of it? *Where would we draw the line?* It's clearly violating the "explicit is better than implicit" rule, and because all of it is happening behind the scenes, it may create more confusion than benefits. After all, the usual API of passing names may be verbose, but it's definitely easy to understand. In particular for examples that opt for informative RV names and short variable names (x = pm.Data("time", [1,2,3])). In the end I think its about trading off slope in the very early vs. medium stage learning curve. (Seemingly easier Quickstart examples at the cost of confusion about moderately complex models.) — Reply to this email directly, view it on GitHub <#6364 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAFETGGDLGN3WOTMAZSCBUDWLZSMJANCNFSM6AAAAAASSRXDM4> . You are receiving this because you were mentioned.Message ID: ***@***.***>

drbenvincent · 2022-12-06T07:14:50Z

@aseyboldt makes some very good points. While I think having the option would be great, it sounds like it's not really an option. Maybe we can better sell why the current apparently verbose approach is beneficial.

ricardoV94 · 2022-12-06T07:44:53Z

I don't think there is any benefit to the current approach, it's just the best we could do within Python's limitations (other than going the JAGS way, and defining models as large strings :D). You have to be verbose. Making that optional would be great.

I agree that this is very inelegant, because we're trying to break out of Python's language. Still I think it's fine to exploit features only available in the CPython implementation. It's not like we are testing or explicitly supporting any other Python implementations.

The strongest argument against this approach is that the byte-code can be changed at any time rendering this approach impossible / cumbersome in the future. It would suck to have to backtrack.

fonnesbeck · 2022-12-06T15:27:58Z

I don't have the usual concerns with "too much magic" or being implicit rather than explicit in this case, because ultimately we are talking about naming variables here. If it saves users time and/or makes the model code more concise, then there is not really any downside to me, as long as the approach is robust and allows for corner cases to be accommodated. Moreover, as long as we retain the option to manually override with the name argument, then it should be backward-compatible.

zaxtax · 2022-12-09T13:26:10Z

I don't think there is any benefit to the current approach, it's just the best we could do within Python's limitations (other than going the JAGS way, and defining models as large strings :D). You have to be verbose. Making that optional would be great.

I agree that this is very inelegant, because we're trying to break out of Python's language. Still I think it's fine to exploit features only available in the CPython implementation. It's not like we are testing or explicitly supporting any other Python implementations.

The strongest argument against this approach is that the byte-code can be changed at any time rendering this approach impossible / cumbersome in the future. It would suck to have to backtrack.

In terms of the bytecode, while it can and does change a little every version release, this code seems to work across many version releases even going as far back as python 2.x

twiecki · 2022-12-09T13:31:13Z

In terms of the bytecode, while it can and does change a little every version release, this code seems to work across many version releases even going as far back as python 2.x

That makes me feel better.

lucianopaz · 2022-12-23T09:06:00Z

Could you also add a test where the model has a name? To me, it looks like the variable autoname will still have the model name prepended to it, but I think that we should explicitly test that it does.

ricardoV94 · 2023-06-16T07:07:02Z

Closing this as the reception was pretty mixed. Thanks anyway @zaxtax!

ricardoV94 · 2023-06-16T09:25:02Z

Reopened, the floor is yours again

ferrine · 2023-06-22T20:50:36Z

I went through the thread and I also feel that this is too much magic for beginners. I agree that this magic helper reduces code for just a little bit, but it creates more trouble than good.

The support of magic naming is not uniform and we put a user into a situation where user has to use or not to use magic naming based on the usage context.
It creates troubles under refactoring
Forward compatibility is questionable and we are about to step onto the ground if python_version == "3.15":... which does not sound fun and creates serious issues if this risk comes real
s = pm.Normal("slope") is a realistic usecase as mentioned by @michaelosthege and a user has to think twice between long vs short naming since it is not only affects trace analysis but also very near context where this slope has to be used

Sudden refactorings and iterations over model create issues back and forth, you refactor little scope code and it affects late visualizations. The modelling is not 1-shot, a thoughtful user iterates over the model and the easier it is the better. The verbose approach saves from thinking twice about naming and also saves from opportunistic coding but unexpected errors

zaxtax · 2023-06-22T21:17:23Z

I went through the thread and I also feel that this is too much magic for beginners. I agree that this magic helper reduces code for just a little bit, but it creates more trouble than good.
* The support of magic naming is not uniform and we put a user into a situation where user has to use or not to use magic naming based on the usage context.

* It creates troubles under refactoring

* Forward compatibility is questionable and we are about to step onto the ground `if python_version == "3.15":...` which does not sound fun and creates serious issues if this risk comes real

* `s = pm.Normal("slope")` is a realistic usecase as mentioned by @michaelosthege and a user has to think twice between long vs short naming since it is not only affects trace analysis but also very near context where this slope has to be used
Sudden refactorings and iterations over model create issues back and forth, you refactor little scope code and it affects late visualizations. The modelling is not 1-shot, a thoughtful user iterates over the model and the easier it is the better. The verbose approach saves from thinking twice about naming and also saves from opportunistic coding but unexpected errors

I should stress that the magic used here is not unheard of for even fairly popular Python packages https://lukeplant.me.uk/blog/posts/pythons-disappointing-superpowers/

What's the risk with a future version of python?

I don't think this PR is ready to merge if it can't be used in a uniform way across the codebase.

How common is this short name vs long name pattern in code? It feels if anything this feature encourages users to us descriptive variable names.

In either case, I appreciate the concerns you have shared

fonnesbeck · 2023-06-22T21:25:16Z

I wonder if its any more magical that what we currently do with the abuse of the Python context manager for adding variables to the model? This seems to be along the same lines, in terms of non-explicit behavior, except that we are giving variables names at the same time that we are having a graph built for us. I'm not yet sold myself, but it has the potential to be a major convenience with relatively little risk.

zaxtax force-pushed the automatic_naming branch from 7005ec9 to c6df184 Compare December 3, 2022 05:26

ricardoV94 reviewed Dec 3, 2022

View reviewed changes

pymc/distributions/distribution.py Show resolved Hide resolved

ricardoV94 reviewed Dec 3, 2022

View reviewed changes

pymc/tests/distributions/test_distribution.py Outdated Show resolved Hide resolved

zaxtax force-pushed the automatic_naming branch from c6df184 to 4b592b1 Compare December 3, 2022 06:21

ricardoV94 added major Include in major changes release notes section request discussion labels Dec 3, 2022

michaelosthege reviewed Dec 3, 2022

View reviewed changes

Add support for automatic naming of random variables

38ef6cc

zaxtax force-pushed the automatic_naming branch from 4b592b1 to 38ef6cc Compare December 3, 2022 15:31

Provide more informative error message

97b1f58

zaxtax force-pushed the automatic_naming branch from 68e3796 to 97b1f58 Compare December 3, 2022 18:03

michaelosthege reviewed Dec 3, 2022

View reviewed changes

ricardoV94 requested changes Dec 4, 2022

View reviewed changes

ccaprani reviewed Dec 5, 2022

View reviewed changes

Update ZeroInflated distribution

e835fa0

ricardoV94 closed this Jun 16, 2023

ricardoV94 reopened this Jun 16, 2023

zaxtax marked this pull request as draft June 16, 2023 09:29

		@@ -164,6 +165,45 @@ def fn(args, *kwargs):
		return fn


		# Helper function from pyprob

Uh oh!

Add support for automatic naming of random variables #6364

Are you sure you want to change the base?

Add support for automatic naming of random variables #6364

Uh oh!

Conversation

zaxtax commented Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ricardoV94 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ricardoV94 Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ricardoV94 commented Dec 3, 2022

Uh oh!

twiecki commented Dec 3, 2022

Uh oh!

michaelosthege commented Dec 3, 2022

Uh oh!

zaxtax commented Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

michaelosthege Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zaxtax Dec 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fonnesbeck commented Dec 3, 2022

Uh oh!

michaelosthege left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aseyboldt commented Dec 4, 2022

Uh oh!

ricardoV94 commented Dec 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ricardoV94 commented Dec 4, 2022

Uh oh!

aseyboldt commented Dec 4, 2022 via email • edited by ricardoV94 Loading Uh oh! There was an error while loading. Please reload this page.

zaxtax commented Dec 3, 2022 •

edited

Loading

codecov bot commented Dec 3, 2022 •

edited

Loading

ricardoV94 Dec 3, 2022 •

edited

Loading

zaxtax commented Dec 3, 2022 •

edited

Loading

michaelosthege Dec 3, 2022 •

edited

Loading

zaxtax Dec 3, 2022 •

edited

Loading

michaelosthege left a comment •

edited

Loading

ricardoV94 commented Dec 4, 2022 •

edited

Loading

aseyboldt commented Dec 4, 2022 via email •

edited by ricardoV94

Loading

ricardoV94 commented Dec 4, 2022 •

edited

Loading

ricardoV94 left a comment •

edited

Loading

ricardoV94 commented Dec 4, 2022 •

edited

Loading

zaxtax commented Dec 4, 2022 •

edited

Loading

drbenvincent commented Dec 4, 2022 •

edited

Loading

fonnesbeck commented Dec 5, 2022 •

edited

Loading

michaelosthege commented Dec 5, 2022 •

edited

Loading

ricardoV94 commented Dec 6, 2022 •

edited

Loading

fonnesbeck commented Dec 6, 2022 •

edited

Loading